Chunyuan Li

My research centers on multimodal intelligence, with a focus on large-scale language and vision training. Key contributions include the LLaVA model family, as well as foundational early work such as GroundingDINO, GLIP, GLIGEN, Florence, and Oscar.
My experience includes research roles at xAI, ByteDance, and Microsoft Research, Redmond. I earned my PhD in machine learning from Duke University under the guidance of Prof. Lawrence Carin, where my doctoral research explored deep generative models. I have also served the community as an Area Chair for NeurIPS, ICML, ICLR, EMNLP, and TMLR, and as a Guest Editor of IJCV for the special issue on "the promises and dangers of large vision models".
news
Date | News |
---|---|
2025 | Grok-3: Visual understanding and real-time video in voice mode. |
2024 | Exploring the boundaries of fully open-source VLMs to establish a mature recipe, documented in the Blog Series and on GitHub. |
Oct/Nov, 2023 | LLaVA is upgraded: |
September 20, 2023 | A 110-page paper is released to share our perspective on multimodal models: "Multimodal Foundation Models: From Specialists to General-Purpose Assistants". It is based on our CVPR 2023 Tutorial. [Note on Large Multimodal Models] [Slides] [YouTube] [Bilibili] |
June 1, 2023 | LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. NeurIPS 2023 Datasets and Benchmarks Track (Spotlight) |
April 17, 2023 | Visual Instruction Tuning with GPT-4! We release LLaVA, a Large Language-and-Vision Assistant towards multimodal GPT-4-level capabilities. NeurIPS 2023 (Oral Presentation) [Project] [Paper] [Github] [Demo] [Data] [Model] [Scaling Note] |
April 7, 2023 | Instruction Tuning with GPT-4! A "first attempt" to use GPT-4-generated data for LLM self-instruct tuning. [Paper] [Github] [My Learnings] |
March, 2023 | CVPR 2023: |
Feb, 2023 | CVPR 2023 Workshop and Challenge on the 2nd Computer Vision in the Wild (CVinW). For those who are new to this topic, please check out the CVinW Reading List. [Workshop] [SGinW Challenge] [RF100 Challenge] |
Oct 23, 2022 | ECCV 2022 Workshop and Challenge on the 1st Computer Vision in the Wild (CVinW). Please check out the videos of this event at [YouTube] [Bilibili]. [Workshop] [ICinW Challenge] [ODinW Challenge] |
Oct 17, 2022 | "Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends", a 100-page survey paper published in Foundations and Trends® in Computer Graphics and Vision. |
Sep 16, 2022 | NeurIPS 2022: K-LITE (Oral, 1%), ELEVATER, and FocalNet. A team effort to push CVinW forward. |
Mar 25, 2022 | Upcoming events as a co-organizer: |
Mar 1, 2022 | CVPR 2022: |
June 17, 2021 | EsViT achieves SoTA 81.3% top-1 accuracy on the ImageNet linear-probe evaluation, outperforming prior art with an order of magnitude higher throughput. [GitHub] |